FOREWORD


Tables and graphs based on the hypergeometric distribution are
presented for use in determining the confidence interval of the sample
estimate of the number of defectives in a finite population. Similarly,
the sample size can be determined which would give a certain quality
level as the lower bound for a selected confidence level. The
hypergeometric distribution is particularly suited for small populations
(less than 1,000) where a saving in the sample size is desired even at
the expense of some loss in precision of the estimate.

The tables of point and cumulative probabilities are tabulations
of selected sample and population combinations. The selected sample
sizes range from 4 to 40 and the population sizes from 50 to 1,000.

For those who have access to an IBM 1401 Model B4 with 8K memory,
the computer program is included as Appendix D.

The  authors  wish  to  express  their  appreciation  to  A.  Ohta  and 
K.  Thornton  for  editing  and  assembly  of  the  tables  and  to  J.  Mitchell  for 
supervising  the  computer  tabulation. 
INTRODUCTION


In quality evaluation, the basic question is: "What is the quality
level of the stockpile in question?" More often this question is put in
the following form: "How large must the sample be to give a certain
level of assurance that the stockpile is no worse than X% defective if
no defectives are observed in the sample?" In this latter form, the
requirement is not for a precise estimate of the stockpile quality level but
rather some assurance that the quality is not below a specified level. In
this situation, the implication is that there is some willingness to
sacrifice some precision if a reduction in the sample size required can be
realized.

Most approaches until recently have been based on the binomial
distribution. But in cases where the stockpile is small, say less than
500, and the unit item cost high, the binomial has not been a very
satisfactory model. As is normally the case in quality evaluation where the
populations are small and sampling is without replacement, it appeared
that the hypergeometric distribution was the more realistic model to
use, but until the advent of the modern-day computer, the formidable
task of calculating the probabilities on a desk calculator prevented its
use.


Of primary concern to the Oahu Laboratory is coping with small
stockpiles of high unit cost weapons. What is desired is a method
whereby stockpiles of extremely high or low quality (percent operability) can
be readily detected using a minimal size sample. For stockpiles falling
in between, additional samples must be tested if greater precision in the
quality estimates is desired. As a result, a study was made of a
two-stage sampling method based on the hypergeometric distribution and
using 95% operability at the 90% confidence level as the lower bound for
"good" stockpiles.


APPROACH  TO  THE  PROBLEM 


Based  on  the  hypergeometric  distribution  two  main  mathematical 
approaches  are  proposed. 


Approach  I 


The first approach may be stated in this mathematical form:

(1)  $$P(D \mid S,N,M) = \frac{\left[\dfrac{M!}{(M-D)!\,D!}\right]\left[\dfrac{(N-M)!}{(N-M-S+D)!\,(S-D)!}\right]}{\dfrac{N!}{(N-S)!\,S!}}$$

where:  S = Sample size
        N = Population size
        D = Sample defectives
        M = Population defectives

and obviously

        S-D = Sample non-defectives
        N-M = Population non-defectives

Equation (1) states that the probability of obtaining D defectives
in a sample of size S, given M defectives in a population of N items, is
equal to the number of ways of drawing D out of M items times the
number of ways of drawing S-D out of N-M items divided by the number of
ways of drawing S out of N items. In stockpile quality estimation it is
desired to find the confidence interval for the number of defectives in a
finite population. Since M is not known, an upper bound on the true M
is sought. Call this bound $M_U$. First assume that $M_U = M_1$. Then look
up in an appropriate table (reference (4)) the sum of the probabilities of
drawing D or fewer defectives in the sample. If this sum is less than
$\alpha$, the significance level (e.g., .10), the proper $M_U$ should be
less than $M_1$. Then choose $M_U = M_2$, $M_2 < M_1$, and repeat the above
procedure until

$$\sum_{D_i=0}^{D} P(D_i \mid S,N,M_U) < \alpha < \sum_{D_i=0}^{D} P(D_i \mid S,N,M_U - 1)$$

(the cumulative sum decreases as M increases). Then select $M_U - 1$ or
$M_U$ as the upper bound depending on which corresponding sum is closer
to $\alpha$.
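The search described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the probabilities are computed directly from equation (1) instead of being read from a table, and the scan runs upward from M = D rather than downward from a trial value.

```python
from math import comb

def hypergeom_pmf(d, s, n, m):
    """Equation (1): probability of d defectives in a sample of s drawn
    without replacement from n items of which m are defective."""
    if d < 0 or d > m or s - d < 0 or s - d > n - m:
        return 0.0
    return comb(m, d) * comb(n - m, s - d) / comb(n, s)

def cumulative(d, s, n, m):
    """Probability of observing d or fewer defectives in the sample."""
    return sum(hypergeom_pmf(k, s, n, m) for k in range(d + 1))

def upper_bound(d, s, n, alpha):
    """Find the first M whose cumulative sum drops below alpha, then return
    that M or its predecessor, whichever sum lies closer to alpha.
    Returns None when no admissible M brings the sum below alpha."""
    for m in range(d, n - s + d + 1):
        tail = cumulative(d, s, n, m)
        if tail < alpha:
            if m == d:
                return m
            previous = cumulative(d, s, n, m - 1)
            return m if alpha - tail <= previous - alpha else m - 1
    return None

print(upper_bound(0, 8, 50, 0.10))
```

For N = 50, S = 8, D = 0 and a significance level of .10, the cumulative sums at M = 11 and M = 12 are about .115 and .091, so the sketch returns 12.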
The maximum likelihood value is given as $\hat{M} = \frac{D}{S}(N+1)$
(reference 1, p. 294) or more completely
$\frac{D}{S}(N+1) - 1 < \hat{M} < \frac{D}{S}(N+1)$
(reference 3, p. 3). References for this approach are found in (1), (2),
and (3).

Approach  II 


The approach followed is to assume that populations with M
defectives, where M ranges from D to (N-S+D), are tested randomly. In
this case $P(D \mid S,N,M_i)$ is related to the probabilities
$P(D \mid S,N,M_j)$ for every $M_j$ in this range. The probability of the
observed sample coming from a population with $M_i$ defectives is given as:

(2)  $$P(M_i \mid D,S,N) = \frac{P(D \mid S,N,M_i)}{\sum_{M_j=D}^{N-S+D} P(D \mid S,N,M_j)}$$

where $D \le M_i \le N-S+D$.

Equation (2) states that if, from a population of size N, a sample
S is drawn and D defectives are observed, the probability of $M_i$ defectives
in the population is equal to the ratio of the probability of that set of
$M_i$, N, S, and D to the total over all sets of N, $M_j$, S, D where $M_j$
is allowed to range from D to (N-S+D). The number of defectives in the
population cannot be less than the number of defectives observed in the
sample nor greater than the difference between the total population N
and (S-D) (sample non-defectives).

The denominator $\sum_{M_j=D}^{N-S+D} P(D \mid S,N,M_j)$ can be shown equal to
$\frac{N+1}{S+1}$, a constant for fixed N and S. This makes it valid to
tabulate $P(D \mid S,N,M)$ instead of $P(M \mid S,N,D)$ for checking purposes.
In the form of equation (1), equation (2) becomes the derived equation:

(3)  $$P(M \mid D,S,N) = \frac{\binom{M}{D}\binom{N-M}{S-D}}{\binom{N+1}{S+1}} = \frac{S+1}{N+1}\,P(D \mid S,N,M)$$
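The claim that the denominator of equation (2) is a constant can be checked numerically. The sketch below, with a hypothetical helper name, sums equation (1) over every admissible M in exact rational arithmetic and confirms that the total is (N+1)/(S+1) whatever the value of D, for a few arbitrarily chosen N and S.

```python
from fractions import Fraction
from math import comb

def denominator(d, s, n):
    """Sum of P(D | S, N, M) over M = D .. N-S+D, in exact arithmetic."""
    return sum(
        Fraction(comb(m, d) * comb(n - m, s - d), comb(n, s))
        for m in range(d, n - s + d + 1)
    )

# The total depends on N and S only, not on the observed D.
for n, s in [(6, 3), (50, 8), (100, 20)]:
    for d in range(s + 1):
        assert denominator(d, s, n) == Fraction(n + 1, s + 1)
```

Because the total does not depend on M, dividing by it rescales every tabulated point probability by the same factor, which is why a table of equation (1) can stand in for the population probabilities.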

For brevity, let $P(M_i \mid D,S,N) = P(M)$. Then these recurrence
relationships are very useful for computational purposes:

(4)  $$\frac{P(M)}{P(M+1)} = \frac{(M-D+1)\,(N-M)}{(M+1)\,(N-M-S+D)}$$

(5)  $$\frac{P(M)}{P(M-1)} = \frac{(M)\,(N-M-S+D+1)}{(M-D)\,(N-M+1)}$$
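As an illustration of how the recurrence relationships save work, the following sketch builds the whole set of population probabilities from the ratio in equation (4) alone, with no factorials. The function name is hypothetical, and for simplicity the serial computation starts at M = D rather than at the maximum likelihood value.

```python
from math import comb, isclose

def posterior_by_recurrence(d, s, n):
    """Build P(M | D, S, N) for M = D .. N-S+D using only the ratio in
    equation (4), then normalize so that the values sum to one."""
    low, high = d, n - s + d
    weights = [1.0]                      # unnormalized value at M = D
    for m in range(low, high):
        # Equation (4) inverted:
        # P(M+1) = P(M) * (M+1)(N-M-S+D) / [(M-D+1)(N-M)]
        step = (m + 1) * (n - m - s + d) / ((m - d + 1) * (n - m))
        weights.append(weights[-1] * step)
    total = sum(weights)
    return {low + i: w / total for i, w in enumerate(weights)}

# Cross-check against the closed form for N = 50, S = 8, D = 2.
table = posterior_by_recurrence(2, 8, 50)
for m, p in table.items():
    direct = comb(m, 2) * comb(50 - m, 6) / comb(51, 9)
    assert isclose(p, direct, rel_tol=1e-9)
```

Each step costs two multiplications and one division, which is the kind of saving that mattered on a machine with limited memory.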


From (4) and (5) it can be seen that $P(M) \ge P(M+1)$, where they
exist, as long as $(M-D+1)(N-M) \ge (M+1)(N-M-S+D)$ and similarly
$P(M) \ge P(M-1)$ as long as $(M)(N-M-S+D+1) \ge (M-D)(N-M+1)$. It follows
that the maximum likelihood integer $\hat{M}$ may be expressed as:

(6)  $$\frac{D}{S}(N+1) - 1 \le \hat{M} \le \frac{D}{S}(N+1)$$

There will be two $\hat{M}$ values where the extreme right and left
expressions are: (a) integers and (b) exist. Since equations (4), (5) and
(6) show that the probabilities decrease from the maximum likelihood
value, serial computations should start with $\hat{M}$ as shown below in
equations (7a) and (7b). (Note: $\hat{M}$ differs from that in Reference 1
and Reference 3.)

(7a)  $$P(\hat{M}) \ge P(\hat{M}-1) \ge P(\hat{M}-2) \ge \cdots \ge P(\hat{M}-k)$$

(7b)  $$P(\hat{M}) \ge P(\hat{M}+1) \ge P(\hat{M}+2) \ge \cdots \ge P(\hat{M}+i)$$

(7c)  $$\sum_{M_j=R}^{T} P(M_j) = 1 - \alpha$$


If $P(\hat{M}+i) > P(\hat{M}-k)$, $P(\hat{M}+i)$ is added to $\sum P(M_j)$, which
starts with $P(\hat{M})$; otherwise $P(\hat{M}-k)$ is added. Equation (7c)
states that R to T is the range of M defectives in the population when
the sum of all the higher probabilities is equal to $1-\alpha$, the
confidence level.
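The accumulation rule just stated can be sketched as follows. The names are hypothetical, the probabilities are computed in closed form rather than serially, and because the distribution is discrete the loop stops at the first total that reaches or exceeds the confidence level instead of hitting it exactly.

```python
from math import comb

def posterior(d, s, n):
    """P(M | D, S, N) for every admissible M, normalized to sum to one."""
    norm = comb(n + 1, s + 1)
    return {m: comb(m, d) * comb(n - m, s - d) / norm
            for m in range(d, n - s + d + 1)}

def highest_probability_interval(d, s, n, alpha):
    """Start at the most probable M and repeatedly add the larger of the
    two neighbouring probabilities until the total reaches 1 - alpha.
    Returns (R, T, total)."""
    post = posterior(d, s, n)
    low, high = d, n - s + d
    r = t = max(post, key=post.get)
    total = post[r]
    while total < 1 - alpha and (r > low or t < high):
        left = post.get(r - 1, -1.0)     # -1.0 marks "does not exist"
        right = post.get(t + 1, -1.0)
        if right > left:
            t += 1
            total += right
        else:
            r -= 1
            total += left
    return r, t, total

print(highest_probability_interval(0, 8, 50, 0.10))
```

For N = 50, S = 8, D = 0 at the 90% confidence level this gives R = 0 and T = 10, with a total of about .91.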


(8)  $$\sum_{M_j=D}^{L} P(M_j) = 1 - \alpha,$$

L being determined by computations. Equation (8) gives the upper bound
of $M_j$, or the so-called one-tail test, for small values of D. For D = 0 and
D = S, the one-tail and two-tail tests coincide. For most values of D,
the probabilities form an asymmetric as well as a discrete distribution.
The asymmetries are illustrated in figure 1, where curved lines are
drawn through point probabilities for N = 50, S = 8, and D = 0 to 4.

Approaches  I  and  II  may  perhaps  be  better  illustrated  by  the  use 
of  black  and  white  missiles. 


Approach  I 

[Figure: the three possible samples of three missiles, labeled (a), (b), and (c).]

There  are  six  missiles  in  the  population  -  four  white  and  two 
black.  The  three  possible  outcomes  in  a  sample  of  three  are  labeled  (a), 
(b),  and  (c);  the  chance  or  probability  of  drawing  any  one  is  determined 
by  equation  (1). 
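The missile illustration can be worked through numerically. The sketch below uses a hypothetical function name and exact fractions to evaluate equation (1) for a population of six missiles, two of them black, and a sample of three. Which outcome corresponds to label (a), (b), or (c) in the figure is not assumed here; the results are keyed by the number of black missiles in the sample.

```python
from fractions import Fraction
from math import comb

def sample_outcome_probabilities(n, m, s):
    """Equation (1) for every possible count d of marked items in a sample
    of s drawn from n items of which m are marked."""
    lowest = max(0, s - (n - m))
    highest = min(m, s)
    return {d: Fraction(comb(m, d) * comb(n - m, s - d), comb(n, s))
            for d in range(lowest, highest + 1)}

# Six missiles, two black, sample of three.
for black, p in sample_outcome_probabilities(6, 2, 3).items():
    print(black, p)
```

The three outcomes (zero, one, or two black missiles) have probabilities 1/5, 3/5, and 1/5.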

Approach II. In stockpile quality evaluation it is desirable to
take approach II, which is the more realistic statistical model. In this
case the population is estimated from a known sample.

[Figure: the observed sample of three missiles and the four possible populations of six, labeled (d), (e), (f), and (g).]
Here, from a population of six, is a sample of three wherein two
white  and  one  black  missiles  were  noted.  The  four  possible  populations 
from  which  this  sample  could  have  been  drawn  are:  (d),  (e),  (f),  and 
(g).  The  chances  or  probability  of  any  one  of  the  populations  being  the 
one  from  which  this  particular  sample  is  drawn  is  determined  by 
equation  (3). 
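The same example can be turned around to show approach II in numbers. In this sketch (hypothetical function name, exact fractions) the sample of three contains one black missile, so the population of six may hold one, two, three, or four black missiles. The mapping of these four cases to the labels (d) through (g) is not assumed.

```python
from fractions import Fraction
from math import comb

def population_probabilities(n, s, d):
    """Probability of each admissible number M of marked items in the
    population, given d marked items seen in a sample of s out of n."""
    weights = {m: comb(m, d) * comb(n - m, s - d)
               for m in range(d, n - s + d + 1)}
    total = sum(weights.values())
    return {m: Fraction(w, total) for m, w in weights.items()}

for black, p in population_probabilities(6, 3, 1).items():
    print(black, p)
```

The four candidate populations receive probabilities 2/7, 12/35, 9/35, and 4/35, so a population with two black missiles is the most likely source of the sample.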

Symmetries 

Since $P(D \mid S,N,M) \propto P(M \mid S,N,D)$ by equation (3),
symmetries given by Lieberman and Owen (Reference 4) for $P(D \mid S,N,M)$
apply in many instances where S and N are fixed.

(9)  $$P(M = M_1 \mid D_1, S, N) = P(M = N - M_1 \mid S, S - D_1, N)$$

(10)  $$\sum_{M=D_1}^{A} P(M \mid S, D = D_1, N) = \sum_{M=N-A}^{N-D_1} P(M \mid S, D = S - D_1, N)$$

Equations (9) and (10) show that tables need only involve half the
sample size.
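The point symmetry of equation (9) is easy to confirm by brute force. The sketch below (hypothetical function name, exact fractions, arbitrarily chosen N = 50 and S = 8) compares the probability of M defectives given D with the probability of N-M defectives given S-D over the whole admissible range.

```python
from fractions import Fraction
from math import comb

def posterior(m, d, s, n):
    """P(M = m | D = d, S = s, N = n) in exact arithmetic."""
    return Fraction(comb(m, d) * comb(n - m, s - d), comb(n + 1, s + 1))

n, s = 50, 8
for d1 in range(s + 1):
    for m1 in range(d1, n - s + d1 + 1):
        # Equation (9): reflect M about N and D about S.
        assert posterior(m1, d1, s, n) == posterior(n - m1, s - d1, s, n)
```

In practice this means a table entry for D greater than S/2 can be read from the entry for S-D by replacing M with N-M.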

Population defectives M given and sample size S unknown

(11)  $$P(M \mid D,S,N) = P(S \mid D,M,N)$$

(12)  $$\sum_{S=D}^{N-M+D} P(D \mid S,N,M) = \frac{N+1}{M+1}$$

Equations (11) and (12) give the basic equations for the problem
of sample size estimation when N, D, and M are known. Any table for
M defectives may be used by interchanging S for M.
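The interchange of S and M can be illustrated with a small exact computation. In this sketch the function names are hypothetical; the probability of equation (1) is summed over every admissible sample size S, the total is compared with (N+1)/(M+1), and one normalized entry is compared with the corresponding population-defectives entry having S and M swapped.

```python
from fractions import Fraction
from math import comb

def likelihood(d, s, n, m):
    """Equation (1) in exact arithmetic."""
    return Fraction(comb(m, d) * comb(n - m, s - d), comb(n, s))

def sample_size_probabilities(d, m, n):
    """P(S | D, M, N) for S = D .. N-M+D, normalized over S.
    Also returns the normalizing total."""
    support = range(d, n - m + d + 1)
    total = sum(likelihood(d, s, n, m) for s in support)
    return {s: likelihood(d, s, n, m) / total for s in support}, total

probs, total = sample_size_probabilities(1, 2, 6)
assert total == Fraction(7, 3)                      # (N+1)/(M+1)

# Interchange check: P(S=3 | D=1, M=2, N=6) equals P(M=3 | D=1, S=2, N=6).
population_entry = Fraction(comb(3, 1) * comb(3, 1), comb(7, 3))
assert probs[3] == population_entry
```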


Two  Stage  Sampling 

Out of a total allowable sample of size S from a population N, an
initial sample $S_1$ is tested and $D_1$ defectives are found. By means of a
table similar to Appendix C, it is found that x defectives gives the
desired percent operability and y defectives (y > x) do not. If $D_1 \le x$,
then the lot is accepted. If $D_1 \ge y$, the lot is rejected. If $x < D_1 < y$,
then the remainder of S, called $S_2$, is sequentially tested. If at any
time a total of $D_1 + C$ defectives are found, the lot is rejected, since the
total sample S would have at least $D_1 + C$ defectives. If the total sample S
is tested and at most $(D_1 + C - 1)$ defectives are found, the lot is
accepted. $D_1 + C$ is determined from the probability table for N, S, and M,
the number of defectives that will be tolerated.
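The two-stage rule can be written as a short decision function. This is a sketch only: the function and parameter names are hypothetical, and the limits x, y, and C are assumed to have been read beforehand from the appropriate probability tables.

```python
def two_stage_decision(d1, x, y, c, second_stage):
    """Apply the two-stage rule.

    d1           defectives found in the initial sample
    x, y         accept and reject limits for the initial sample (x < y)
    c            additional defectives allowed before rejection
    second_stage iterable of booleans, True for a defective item, in the
                 order the remaining items would be tested

    Returns (decision, number of second-stage items actually tested).
    """
    if d1 <= x:
        return "accept", 0
    if d1 >= y:
        return "reject", 0
    limit = d1 + c
    defectives = d1
    tested = 0
    for item_is_defective in second_stage:
        tested += 1
        if item_is_defective:
            defectives += 1
            if defectives >= limit:
                return "reject", tested   # stop early, remaining items saved
    return "accept", tested

print(two_stage_decision(1, 0, 3, 2, [True, True, False]))
```

The early return on rejection is where the saving in high unit cost items comes from: testing stops as soon as the outcome is settled.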

Best Sample Size S for a Given D

For some values of D sample defectives, a large sample size S
is required to reach the desired percent operability for a given confidence
level. In these cases, the percent operability should perhaps be lowered
to the point where the additional sample units give less than some
preselected gain value in percent operability. This is illustrated in figure 2,
which is patterned after the graphs in Appendix B.


FIGURE 2. SAMPLE SIZE VERSUS OPERABILITY.


Here for D = a, an additional sample of 10, from 30 to 40, will
result in a very small gain in percent operability and, therefore, S = 30
may be the better choice of sample size.
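The diminishing-return argument can be sketched in code. Everything here is an illustrative assumption: the function names, the use of the one-tail bound of equation (8) to define the lower percent operability, and the rule of stopping at the first candidate size whose successor adds less than a preselected gain.

```python
from math import comb

def operability_lower_bound(n, s, d, alpha):
    """Percent operability at the one-tail upper bound on M: the smallest
    L whose cumulative population probability reaches 1 - alpha."""
    norm = comb(n + 1, s + 1)
    running = 0.0
    for m in range(d, n - s + d + 1):
        running += comb(m, d) * comb(n - m, s - d) / norm
        if running >= 1 - alpha:
            return 100.0 * (n - m) / n
    return 100.0 * (s - d) / n           # every admissible M was needed

def choose_sample_size(n, d, alpha, sizes, min_gain):
    """First size in `sizes` (ascending) after which the next size adds
    fewer than `min_gain` points of percent operability."""
    for current, larger in zip(sizes, sizes[1:]):
        gain = (operability_lower_bound(n, larger, d, alpha)
                - operability_lower_bound(n, current, d, alpha))
        if gain < min_gain:
            return current
    return sizes[-1]

print(operability_lower_bound(50, 8, 0, 0.10))
```

For N = 50, S = 8, D = 0 at the 90% confidence level the bound is 80 percent operability. The gain threshold itself is a judgment about item cost and is not something the tables can supply.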
RESULTS
Tables and Graphs

The table and graphs in Appendices A and B are useful as "quick
look" references. The table gives the range for population defectives for
certain fixed confidence levels. The graphs give different population lines
in terms of percent operability versus fixed sample sizes. Smooth curves
are drawn through interpolated points. Only the lower percent operability
values are plotted for clarity.

Print-out of Probabilities


A sample of the computer print-out of probabilities is given in
Appendix C. Point probabilities and sums for $P(D \mid S,N,M)$ were tabulated
instead of $P(M \mid D,S,N)$ for purposes of checking with other tables.

The abbreviations are:

S = Sample size
N = Population size
D = Sample defectives
M = Population defectives
P = $P(D \mid S,N,M)$
SUM = $\sum_{M=R}^{T} P(D \mid S,N,M)$, where the sum contains the highest
      probabilities from R to T
CONT = $\sum_{M=R}^{T} P(D \mid S,N,M) \Big/ \sum_{M=D}^{N-S+D} P(D \mid S,N,M)$


INTERVAL = $\frac{\text{Lower } M}{N}$ to $\frac{\text{Upper } M}{N}$

LEFT SUM = Sum of decreasing probabilities to the left of maximum
likelihood

RIGHT SUM = Sum of decreasing probabilities to the right of maximum
likelihood


A one-tail test is possible from this type of table. Assume $1 - \alpha$
is the confidence level,

$$Q = \sum_{M=D}^{N-S+D} P(D \mid S,N,M), \quad\text{and}\quad R = \sum_{M=\hat{M}+1}^{N-S+D} P(D \mid S,N,M).$$

Compute $\alpha Q$ and subtract from R. Then trace back in the "Right sum"
column until a value just exceeding $R - \alpha Q$ is found. The value of
upper M in the same row is the one-tail upper bound on the population
defectives.
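The table look-up just described can be imitated in a few lines. The sketch assumes that the "Right sum" column holds the running total of the point probabilities to the right of the maximum likelihood value, and that the bound lies to the right of that value; the function name is hypothetical. Scanning the running total upward until it first reaches R minus alpha times Q finds the same row as tracing back from the end of the column.

```python
from math import comb

def one_tail_upper_bound(d, s, n, alpha):
    """One-tail upper bound on population defectives from the point
    probabilities P(D | S, N, M) and their running right-hand sum."""
    support = range(d, n - s + d + 1)
    p = {m: comb(m, d) * comb(n - m, s - d) / comb(n, s) for m in support}
    mode = max(p, key=p.get)                      # maximum likelihood M
    q = sum(p.values())                           # total over all M
    r = sum(p[m] for m in support if m > mode)    # total right of the mode
    target = r - alpha * q
    right_sum = 0.0
    for m in range(mode + 1, n - s + d + 1):
        right_sum += p[m]
        if right_sum >= target:
            return m
    return mode    # assumption: no row to the right was needed

print(one_tail_upper_bound(0, 8, 50, 0.10))
```

For N = 50, S = 8, D = 0 and a 90% confidence level, Q is 51/9, R is Q - 1, and the first right sum reaching R - .1Q occurs at M = 10.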


Program  for  the  Hypergeometric  Series 

The symbolic language program developed for the IBM 1401
computer, Mod B4, is given in Appendix D. Accuracy in computation of
factorials was mainly accomplished by having the decision to multiply by
a number from the numerator or divide by a number from the denominator
depend upon the number of leading zeros resulting from the previous
calculation. Individual probabilities were then usually accurate to ten
places and sums of probabilities to eight.
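The idea of alternating multiplications and divisions to protect accuracy can be sketched in Python. This is not the Appendix D program: the names are hypothetical, floating-point arithmetic replaces the 1401's decimal arithmetic, and the test "is the running value below one" stands in for the count of leading zeros.

```python
from math import comb, isclose

def balanced_ratio(numerators, denominators):
    """Evaluate prod(numerators) / prod(denominators) while keeping the
    running value near one: multiply when it is small, divide otherwise."""
    num = sorted(numerators, reverse=True)   # pop() yields the smallest first
    den = sorted(denominators, reverse=True)
    value = 1.0
    while num or den:
        if num and (value < 1.0 or not den):
            value *= num.pop()
        else:
            value /= den.pop()
    return value

def factorial_factors(k):
    """Factors of k! other than one."""
    return list(range(2, k + 1))

def hypergeom_balanced(d, s, n, m):
    """Equation (1) written as a single ratio of factorial factors."""
    numerators = (factorial_factors(m) + factorial_factors(n - m)
                  + factorial_factors(n - s) + factorial_factors(s))
    denominators = (factorial_factors(m - d) + factorial_factors(d)
                    + factorial_factors(n - m - s + d)
                    + factorial_factors(s - d) + factorial_factors(n))
    return balanced_ratio(numerators, denominators)

exact = comb(50, 2) * comb(950, 38) / comb(1000, 40)
assert isclose(hypergeom_balanced(2, 40, 1000, 50), exact, rel_tol=1e-9)
```

Forming the full factorials first would overflow or lose digits long before the division; interleaving the operations keeps every intermediate result within a few orders of magnitude of one.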

The abbreviations used in the print-out in Appendix D are:

PG LIN    = Page and line identification
CT        = Count for instruction or reserved storage
OP        = Operation instruction
A OPERAND = A or I address of instruction
B OPERAND = B address of instruction
D         = D character modification of the basic instruction
From: Commanding Officer

To: Chief, Bureau of Naval Weapons (FQ-1)

Department  of  the  Navy 
Washington  25,  D.C. 

Subj:  Tables  of  the  Hypergeometric  Distribution  Functions 

Encl: (1) Mathematical derivation of the tables

(2)  Tables  of  point  and  cumulative  probabilities  for  various 
sample  and  population  size  combinations 

1. Enclosures (1) and (2) are forwarded for publication by the U.S.
Government Printing Office. Enclosure (1) contains mathematical
derivation of the hypergeometric distribution function, tables, graphs,
and IBM 1401 computer program. Enclosure (2) (forwarded under separate
cover) contains two copies of the tabulations of the hypergeometric
probabilities.

2. The tables, which include point and cumulative probabilities, are
designed primarily for use by personnel familiar with statistics to
estimate stockpile quality level.

3. Copy addressees are advised that copies of the tabular presentation,
enclosure (2), which are quite voluminous, cannot be made available by
this command. It is presumed that they will be generally available through
the  Bureau  of  Naval  Weapons  or  the  Government  Printing  Office  if  the. 